
22 research outputs found

Quasi-Newton Steps for Efficient Online Exp-Concave Optimization

Author: Gatmiry Khashayar
Mhammedi Zakaria
Publication date: 14/02/2023

The aim of this paper is to design computationally efficient and optimal algorithms for the online and stochastic exp-concave optimization settings. Typical algorithms for these settings, such as the Online Newton Step (ONS), can guarantee an $O(d \ln T)$ bound on their regret after $T$ rounds, where $d$ is the dimension of the feasible set. However, such algorithms perform so-called generalized projections whenever their iterates step outside the feasible set. Such generalized projections require $\Omega(d^3)$ arithmetic operations even for simple sets such as a Euclidean ball, making the total runtime of ONS of order $d^3 T$ after $T$ rounds in the worst case. In this paper, we side-step generalized projections by using a self-concordant barrier as a regularizer to compute the Newton steps. This ensures that the iterates are always within the feasible set without requiring projections. This approach still requires the computation of the inverse of the Hessian of the barrier at every step. However, using the stability properties of the Newton steps, we show that the inverse of the Hessians can be efficiently approximated via Taylor expansions for most rounds, resulting in an $O(d^2 T + d^\omega \sqrt{T})$ total computational complexity, where $\omega$ is the exponent of matrix multiplication. In the stochastic setting, we show that this translates into an $O(d^3/\epsilon)$ computational complexity for finding an $\epsilon$-suboptimal point, answering an open question by Koren 2013. We first show these new results for the simple case where the feasible set is a Euclidean ball. Then, to move to general convex sets, we use a reduction to Online Convex Optimization over the Euclidean ball. Our final algorithm can be viewed as a more efficient version of ONS.

Comment: First revision: presentation improvement
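The idea of approximating an inverse by a Taylor expansion can be illustrated generically. The sketch below is not the paper's algorithm: it assumes a matrix `A` whose inverse is already known and a small drift `E`, and approximates `(A + E)^{-1} g` with a truncated series. The function name `taylor_solve` and all demo data are hypothetical.

```python
import numpy as np


def taylor_solve(A_inv, E, g, order=2):
    """Approximate (A + E)^{-1} g from a known A^{-1}, using a truncated series.

    Uses (A + E)^{-1} = sum_k (-A^{-1} E)^k A^{-1}, which converges when
    the spectral norm of A^{-1} E is below 1. Only matrix-vector products
    are used, so each extra term costs O(d^2).
    """
    term = A_inv @ g                      # zeroth-order term: A^{-1} g
    approx = term.copy()
    for _ in range(order):
        term = -(A_inv @ (E @ term))      # apply (-A^{-1} E) to the previous term
        approx += term
    return approx


# Hypothetical demo data: a well-conditioned SPD matrix and a small symmetric drift.
rng = np.random.default_rng(0)
d = 5
B = rng.standard_normal((d, d))
A = B @ B.T + d * np.eye(d)
C = rng.standard_normal((d, d))
E = 1e-2 * (C + C.T)
g = rng.standard_normal(d)

A_inv = np.linalg.inv(A)                  # computed once, then reused
exact = np.linalg.solve(A + E, g)         # reference answer for comparison
err_order0 = np.linalg.norm(taylor_solve(A_inv, E, g, order=0) - exact)
err_order2 = np.linalg.norm(taylor_solve(A_inv, E, g, order=2) - exact)
```

When consecutive matrices differ only slightly, a few series terms already track the exact solve closely, which is the general reason a stale inverse can be reused across rounds. How the paper bounds the drift and schedules exact recomputations is not shown here.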

arXiv.org e-Print Archive

Lipschitz Adaptivity with Multiple Learning Rates in Online Learning

Author: Koolen Wouter M.
Mhammedi Zakaria
van Erven Tim
Publication date: 30/05/2019

We aim to design adaptive online learning algorithms that take advantage of any special structure that might be present in the learning task at hand, with as little manual tuning by the user as possible. A fundamental obstacle that comes up in the design of such adaptive algorithms is to calibrate a so-called step-size or learning rate hyperparameter depending on variance, gradient norms, etc. A recent technique promises to overcome this difficulty by maintaining multiple learning rates in parallel. This technique has been applied in the MetaGrad algorithm for online convex optimization and the Squint algorithm for prediction with expert advice. However, in both cases the user still has to provide in advance a Lipschitz hyperparameter that bounds the norm of the gradients. Although this hyperparameter is typically not available in advance, tuning it correctly is crucial: if it is set too small, the methods may fail completely; but if it is taken too large, performance deteriorates significantly. In the present work we remove this Lipschitz hyperparameter by designing new versions of MetaGrad and Squint that adapt to its optimal value automatically. We achieve this by dynamically updating the set of active learning rates. For MetaGrad, we further improve the computational efficiency of handling constraints on the domain of prediction, and we remove the need to specify the number of rounds in advance.

Comment: 22 pages. To appear in COLT 201
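As a rough illustration of what "maintaining multiple learning rates in parallel" can look like, the sketch below runs projected online gradient descent with several step sizes and combines the iterates with exponential weights (Hedge). This is a simplified stand-in, not MetaGrad or Squint: the loss, the domain `[-1, 1]`, the grid `etas`, and the function name `run_multi_rate_ogd` are all assumptions made for the example.

```python
import numpy as np


def run_multi_rate_ogd(targets, etas):
    """Run one projected-OGD learner per step size and aggregate them with Hedge.

    Loss per round: (x - y)^2 / 4 with x, y in [-1, 1], so losses lie in [0, 1].
    Returns the aggregate's cumulative loss and each learner's cumulative loss.
    """
    etas = np.asarray(etas, dtype=float)
    T, K = len(targets), len(etas)
    hedge_lr = np.sqrt(8.0 * np.log(K) / T)    # standard tuning for losses in [0, 1]
    x = np.zeros(K)                            # one iterate per learning rate
    log_w = np.zeros(K)                        # log-weights over the learners
    expert_loss = np.zeros(K)
    combined_loss = 0.0
    for y in targets:
        w = np.exp(log_w - log_w.max())        # shift by the max for numerical stability
        w /= w.sum()
        x_hat = float(w @ x)                   # aggregate prediction: weighted average
        combined_loss += (x_hat - y) ** 2 / 4.0
        losses = (x - y) ** 2 / 4.0
        expert_loss += losses
        log_w -= hedge_lr * losses             # exponential-weights update
        grads = (x - y) / 2.0                  # gradient of each learner's loss
        x = np.clip(x - etas * grads, -1.0, 1.0)   # gradient step, then projection
    return combined_loss, expert_loss


# Hypothetical data: a slowly varying target sequence inside [-1, 1].
targets = 0.8 * np.sin(np.arange(200) / 10.0)
combined, per_rate = run_multi_rate_ogd(targets, etas=[0.01, 0.1, 1.0])
```

Because the loss is convex and bounded in `[0, 1]`, the aggregate's cumulative loss exceeds that of the best step size by at most `sqrt(T ln(K) / 2)`, so no single learning rate has to be chosen in advance. Note that this sketch still hard-codes the loss range, which is exactly the kind of prior knowledge the abstract aims to remove.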

arXiv.org e-Print Archive

CWI's Institutional Repository

Lipschitz and Comparator-Norm Adaptivity in Online Learning

Author: Koolen Wouter M.
Mhammedi Zakaria
Publication date: 27/02/2020

We study Online Convex Optimization in the unbounded setting where neither predictions nor gradients are constrained. The goal is to simultaneously adapt to both the sequence of gradients and the comparator. We first develop parameter-free and scale-free algorithms for a simplified setting with hints. We present two versions: the first adapts to the squared norms of both comparator and gradients separately using $O(d)$ time per round, the second adapts to their squared inner products (which measure variance only in the comparator direction) in time $O(d^3)$ per round. We then generalize two prior reductions to the unbounded setting; one to not need hints, and a second to deal with the range ratio problem (which already arises in prior work). We discuss their optimality in light of prior and new lower bounds. We apply our methods to obtain sharper regret bounds for scale-invariant online prediction with linear models.

Comment: 30 Pages, 1 Figur

arXiv.org e-Print Archive

CWI's Institutional Repository

Adaptivity in Online and Statistical Learning

Author: Mhammedi Zakaria
Publication date: 01/01/2021

Many modern machine learning algorithms, though successful, are still based on heuristics. In a typical application, such heuristics may manifest in the choice of a specific Neural Network structure, its number of parameters, or the learning rate during training. Relying on these heuristics is not ideal from a computational perspective (often involving multiple runs of the algorithm), and can also lead to over-fitting in some cases. This motivates the following question: for which machine learning tasks/settings do there exist efficient algorithms that automatically adapt to the best parameters? Characterizing the settings where this is the case and designing corresponding (parameter-free) algorithms within the online learning framework constitutes one of this thesis' primary goals. Towards this end, we develop algorithms for constrained and unconstrained online convex optimization that can automatically adapt to various parameters of interest such as the Lipschitz constant, the curvature of the sequence of losses, and the norm of the comparator. We also derive new performance lower-bounds characterizing the limits of adaptivity for algorithms in these settings. Part of systematizing the choice of machine learning methods also involves having "certificates" for the performance of algorithms. In the statistical learning setting, this translates to having (tight) generalization bounds. Adaptivity can manifest here through data-dependent bounds that become small whenever the problem is "easy". In this thesis, we provide such data-dependent bounds for the expected loss (the standard risk measure) and other risk measures. We also explore how such bounds can be used in the context of risk-monotonicity

The Australian National University

PAC-Bayesian Bound for the Conditional Value at Risk

Author: Guedj Benjamin
Mhammedi Zakaria
Williamson Robert C.
Publication date: 25/06/2020

Conditional Value at Risk (CVaR) is a family of "coherent risk measures" which generalize the traditional mathematical expectation. Widely used in mathematical finance, it is garnering increasing interest in machine learning, e.g., as an alternate approach to regularization, and as a means for ensuring fairness. This paper presents a generalization bound for learning algorithms that minimize the CVaR of the empirical loss. The bound is of PAC-Bayesian type and is guaranteed to be small when the empirical CVaR is small. We achieve this by reducing the problem of estimating CVaR to that of merely estimating an expectation. This then enables us, as a by-product, to obtain concentration inequalities for CVaR even when the random variable in question is unbounded
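For concreteness, the empirical CVaR of a loss sample can be computed from the Rockafellar-Uryasev variational form, CVaR_alpha(Z) = min over mu of { mu + E[max(Z - mu, 0)] / alpha }. The sketch below assumes that convention, under which alpha = 1 recovers the plain mean and smaller alpha concentrates on the worst losses; the function name `empirical_cvar` is hypothetical and the abstract's bound itself is not reproduced.

```python
import numpy as np


def empirical_cvar(losses, alpha):
    """Empirical CVaR at level alpha in (0, 1] via the variational formula.

    The objective mu + mean(max(z - mu, 0)) / alpha is convex and piecewise
    linear in mu with kinks at the sample points, so its minimum is attained
    at one of the samples; we simply evaluate it at every sample.
    """
    z = np.asarray(losses, dtype=float)
    if z.ndim != 1 or z.size == 0:
        raise ValueError("losses must be a non-empty 1-D sequence")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    # excess[i, j] = max(z[j] - z[i], 0): sample j's excess over candidate mu = z[i]
    excess = np.maximum(z[None, :] - z[:, None], 0.0)
    objective = z + excess.mean(axis=1) / alpha
    return float(objective.min())


sample = [1.0, 2.0, 3.0, 4.0]
tail_risk = empirical_cvar(sample, alpha=0.5)   # average of the worst half
mean_risk = empirical_cvar(sample, alpha=1.0)   # reduces to the sample mean
```

The quadratic-size `excess` matrix keeps the sketch short; sorting the sample would give the same value with less memory for large datasets.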

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

UCL Discovery

HAL Descartes

PAC-Bayes Unexpected Bernstein Inequality

Author: Grünwald P.D. (Peter)
Guedj B. (Benjamin)
Mhammedi Z. (Zakaria)
Publication date: 01/12/2019

We present a new PAC-Bayesian generalization bound. Standard bounds contain a $\sqrt{L_n \cdot \mathrm{KL}/n}$ complexity term which dominates unless $L_n$, the empirical error of the learning algorithm's randomized predictions, vanishes. We manage to replace $L_n$ by a term which vanishes in many more situations, essentially whenever the employed learning algorithm is sufficiently stable on the dataset at hand. Our new bound consistently beats state-of-the-art bounds both on a toy example and on UCI datasets (with large enough $n$). Theoretically, unlike existing bounds, our new bound can be expected to converge to $0$ faster whenever a Bernstein/Tsybakov condition holds, thus connecting PAC-Bayesian generalization and excess risk bounds; for the latter it has long been known that faster convergence can be obtained under Bernstein conditions. Our main technical tool is a new concentration inequality which is like Bernstein's but with $X^2$ taken outside its expectation

CWI's Institutional Repository

PAC-Bayes Un-Expected Bernstein Inequality

Author: Grünwald Peter
Guedj Benjamin
Mhammedi Zakaria
Publication venue: HAL CCSD
Publication date: 09/12/2019

We present a new PAC-Bayesian generalization bound. Standard bounds contain a $\sqrt{L_n \cdot \mathrm{KL}/n}$ complexity term which dominates unless $L_n$, the empirical error of the learning algorithm's randomized predictions, vanishes. We manage to replace $L_n$ by a term which vanishes in many more situations, essentially whenever the employed learning algorithm is sufficiently stable on the dataset at hand. Our new bound consistently beats state-of-the-art bounds both on a toy example and on UCI datasets (with large enough $n$). Theoretically, unlike existing bounds, our new bound can be expected to converge to $0$ faster whenever a Bernstein/Tsybakov condition holds, thus connecting PAC-Bayesian generalization and excess risk bounds; for the latter it has long been known that faster convergence can be obtained under Bernstein conditions. Our main technical tool is a new concentration inequality which is like Bernstein's but with